Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Metadata #104

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

[WIP] Metadata #104

wants to merge 1 commit into from

Conversation

JeromeMartinez
Copy link
Contributor

As we can add content at the end of Parameters() in version 3, we don't need to wait for version 4 for adding metadata not impacting the frame bitstream.

I suggest to add frame size (width, height) in order to be independent from the container (and some container e.g. MPEG-TS does not have a place for such information), self describing, and also add metadata part (same reason: self describing and losing nothing even if there is a transwrapping).

The design has the goal to be able to skip metadata items not known by the decoder (i.e. if we add a new metadata id, we don't need to increase the version or micro_version number), as all metadata items are optional and not needed for the decoder (e.g. matrix coefs are just set as unknown if not present or not supported by the decoder, as it is nowadays). metadata_size would be 1 most of the time, I expect exceptions for char strings e.g. library name (it would be the count of unicode characters in this metadata)
Names of metadata come from Matroska.

Work in progress, for feedback and agreement about the way to do it, before I expand it.

@dericed
Copy link
Contributor

dericed commented Feb 7, 2018

The idea seems ok to me. Though I'm nervous of a large metadata registry ballooning the specification. I'd recommend citing an external source for the semantics, storage, and vocabulary of metadata values where possible and avoid defining them internally where possible. This approach does seem interesting for handling round-trips of DPX color data in DPX->FFV1-DPX scenarios. Also I suggest some presence or summary of this concept on the cellar working group.

@michaelni
Copy link
Member

There are streams where width/height changes mid stream. So a lossless format should not put w/h in a global header ideally. the version check is also bad as micro version depends on version. The check misbehaves after version 3. A better way is to combine the integer and fractional version parts before comparing. The metadata format has issues too, theres no way to store a key as a string for example. And you mix count and size in the pseudo code, one is unused the other looks unintendedly used twice

@JeromeMartinez
Copy link
Contributor Author

From feedback I understand you are not against the idea itself, the current state of the PR was more for a proposal of the idea before cleaning up the proposal so I would not spend too much time on it if it is not the right direction, so now I fix issues raised, asking for having validation on the technical part then I expand the text around modifications.

There are streams where width/height changes mid stream.

Most of containers do not support changes mid stream (AFAIK AVI, MKV and MP4 don't), and the ones supporting that are expanded with e.g. an intermediate layer providing the complete configuration record of the new stream configuration: a lot of things could change mid stream (bit depth, color space...) so I don't think that having an exception for width/height, without the same possibility for all other fields, make a lot of sense.

Anyway, if it is blocking I changed to width - 1 by width with 0 meaning "unspecified in the bitstream (provided by container)" (previous behavior, a muxer knowing that it may change would not fill that, a muser knowing that it will never change would fill it), so it does not prevent to use metadata part with exactly same "features" (here: width/height changes mid stream is still possible), but still arguing to not have exception for width/height but more specifying that when something changes the whole Configuration Record need to be transmitted.

The check misbehaves after version 3.

We didn't define fractional part, I am reluctant to add a complete definition for only that.
Changed to
"if (version == 3 && micro_version >= 5)" for v3 (v0-3) and "if ((version == 3 && micro_version >= 5) || version >= 4)" for v4.

theres no way to store a key as a string for example.

there is, metadata_size is for this exact purpose. It would be e.g. a count of characters (metadata_size characters), with 1 "ur" = 1 Unicode (UTF-32, no need to use UTF-8 as the range coder part does the work) character.

one is unused the other looks unintendedly used twice

Typo, fixed. At the same time, I changed metadata_size by metadata_size [ i ] for being clear about the fact the size is for metadata "i".

@dericed I am think to either just put the algorithm with no definition of any metadata id (we put them in a separate document), or list the metadata names and link to Matroska specs and centralize definition in Matroska, as Matroska is in CELLAR too.


Please review again (principle and technical part) and let's agree on the principle, then I expand it.

@retokromer
Copy link
Contributor

retokromer commented Feb 7, 2018

My 22 cents:

  • A change of size may be used for including e.g. HD content in an UHD production.
  • We often use change of LUTs when working on films made with different materials (e.g. negative-positive with included reversal excerpts).
  • As said some time ago, I suggest that, in case of discrepancy, the codec’s metadata should be preferred over the container’s metadata.
  • And I recommend to expand Матрёшка to allow changes in the middle of the stream.

@JeromeMartinez
Copy link
Contributor Author

A change of size may be used for including e.g. HD content in an UHD production.

Got it, just remarking that all could change (e.g. we could want to keep the same slice size, so we need 4x more slices in the UHD production, so we need to change that too mid stream).

As said some time ago, I suggest that, in case of discrepancy, the codec’s metadata should be preferred to the container’s metadata.

It is sometimes problematic, e.g. when frame rate is indicated in the bitstream (in that case the container frame rate is prefered as time stamps are from it). Anyway, I think it is a player policy and I don't see how to "force" that in a spec.

And I recommend to expand Матрёшка to allow changes in the middle of the stream.

Definitely something we need to work on, by e.g. permitting a completely new Configuration Record mid stream (to be discussed).

@retokromer
Copy link
Contributor

Anyway, I think it is a player policy and I don't see how to "force" that in a spec.

My suggestion is that a player SHOULD.

(At the Berlin symposium I proposed container before codec as well, then I changed my mind. I guess, I said that on the Cellar list. Not blocking at all on my side, I was just sharing some experience I gathered in our playgrounds.)

@JeromeMartinez
Copy link
Contributor Author

My suggestion is that a player SHOULD.

I added a line about it, for debate.

@michaelni
Copy link
Member

Please review again (principle and technical part) and let's agree on the principle, then I expand it.

about principle, I think its useful in principle but the currently proposed WIP changes feel rushed and hacked together ATM. iam in favor of it in principle but it must be improved but iam not sure what we will need and want will be possible before v4.

w=h=0 (provided by container) is considering that changing w/h should be at codec level not helping. Especially for most existing containers changing parameters need to be at the codec level. And even if a future container allows storing "per frame" w/h that still wont be enough i suspect. Theres more needed at the codec level.
So this w=h=0 look at container case increases complexity without fixing the issue.

string metadata keys are not supported in your proposal, you call it "metadata meaning" / "metadata_id", the list is finite and fixed, a user cannot store things that are not listed in this.

string values, theres a binary case supported. strings should probably be a type different from binary data. Also with strings the encoding needs to be specified. I assume that would always be UTF-8 this should be mentioned in the spec. The spec should mention what to do about invalid UTF8 encodings, this may also needs expansion of the security section.

Also if you want to make the stream independant of the container (by adding w/h), you also need timebase and timestamps or something equivalent.

The encoding in your proposal allows storing 1D arrays, it also lists matrixes, but how would a user defined matrix work as in "mymatrix" and 6 elements, that could be a 3x2 or 2x3 matrix?

And with string metadata comes the question of language. A title (which is metadata) is in a language, there may be multiple titles in multiple languages.

@JeromeMartinez
Copy link
Contributor Author

we will need and want will be possible before v4.

At least me needs it now (color metadata is often lost during trans-wrapping), more than optimization of the compression, and as it is not breaking v3 decoding I think it is valuable to have in v3. But definitely not needed for v3 standardization, just that I heard that there is no work on improving FFV1 in parallel to standardization so I propose patches for improving FFV1 with what is in my pending list (and for the moment, I see no need to have break in specs so no v4 stuff for this kind of change).

w=h=0 (provided by container) is considering that changing w/h should be at codec level not helping.

It would permit to have width/height in bitstream without preventing your use case. I don't understand the issue, please suggest a method for transporting width/height without issue for anyone.
Anyway, I didn't think it is not a subject to debate, so I remove these fields from this PR.

string metadata keys are not supported in your proposal, you call it "metadata meaning" / "metadata_id", the list is finite and fixed, a user cannot store things that are not listed in this.

I don't understand what is not working:
for each metadata, you get an id (you know what it is), and length (you know how long it is) and a type (you know how to skip data), and the way to skip a metadata is indicated in the pseudo-code (I could remove the list of ID and it would be same). what is missing preventing you to skip an unknown metadata?

strings should probably be a type different from binary data

Idea is to be very light, and there is no need to have more actually. I could still add a new type "char" but I don't see the reason we need to have that as "ur" is good for UTF-32.

I assume that would always be UTF-8 this should be mentioned in the spec.

As said, UTF-32 is possible, "ur" can handle 32-bit values.

The spec should mention what to do about invalid UTF8 encodings,

IMO it is UTF-8 (actually UTF-32) specification, Matroska does not define it too, and decoder can just skip the content.

Also if you want to make the stream independant of the container (by adding w/h), you also need timebase and timestamps or something equivalent.

timebase stuff is the role of the container, even MPEG-TS does that (making info about width/height mandatory in the codec as there is no place for that elsewhere).
All other compressed bitstream I know do transport width/height.
Any, let separate this issue, I'll open a dedicated PR for that later.

The encoding in your proposal allows storing 1D arrays, it also lists matrixes, but how would a user defined matrix work as in "mymatrix" and 6 elements, that could be a 3x2 or 2x3 matrix?

Would be in the definition of the metadata. I don't define any 1D array, just a suite of elements (with a dedicated count). it is generic and the goal is not to have complex stuff at the codec level. Matroska does same, having few type (signed, unsigned, string as UTF-8 as they don't have "Range Coder" stuff, and binary used for any 2D/3D array)

And with string metadata comes the question of language. A title (which is metadata) is in a language, there may be multiple titles in multiple languages.

I don't expect to transport title or any localized metadata, the idea is not to recreate Matroska, just transport video-related metadata, like a lot of other video format (H.264 VUI and SEI etc...).

@michaelni
Copy link
Member

string metadata keys are not supported in your proposal, you call it "metadata meaning" / "metadata_id", the list is finite and fixed, a user cannot store things that are not listed in this.

I don't understand what is not working:
for each metadata, you get an id (you know what it is), and length (you know how long it is) and a type (you know how to skip data), and the way to skip a metadata is indicated in the pseudo-code (I could remove the list of ID and it would be same). what is missing preventing you to skip an unknown metadata?

you seem not to understand what i mean. Let me give you a very silly example
key: "Jerome Martinez height"
value: "X cm"
nothing in the metadata meaning or id will match the key. This information thus cannot be stored.
the example is silly but iam sure you can find thousands of non silly things that will not fit unless a generic key/id string or equivalent is supported.

strings should probably be a type different from binary data

Idea is to be very light, and there is no need to have more actually. I could still add a new type "char" but I don't see the reason we need to have that as "ur" is good for UTF-32.

In an application that supports only string based metadata, binary will alway have to be converted to a string in some form base64 encoding or whatever. Now if there is no information on if data is a string or binary that would like become messy in an implementation.
Another issue is dealing with invalid utf8 encodings. For this the implementation needs to know what is a string. also presenting the data to the user requires to know if its binary or utf8.

Also if you want to make the stream independant of the container (by adding w/h), you also need timebase and timestamps or something equivalent.

timebase stuff is the role of the container, even MPEG-TS does that (making info about width/height mandatory in the codec as there is no place for that elsewhere).
All other compressed bitstream I know do transport width/height.
Any, let separate this issue, I'll open a dedicated PR for that later.

ok but what you say here about MPEG is not correct. mpeg-ts or PS does in general not store timestamps for every frame at the container level. The frame rate and frame duration from the codec layer has to be used to find per frame timestamps in MPEG-1/2 (this becomes more complex than this after MPEG-2) In this sense the codec layer frame rate represents the timebase, the frame/field repeat flags represent the timestamps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants